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Installing and Configuring the Foundation Monitor Toolkit 


Installing and Configuring the 
Foundation Monitor Toolkit 


The Foundation Monitor toolkit reports on the degree of availability 
provided by enterprise cluster components. The Foundation Monitor 
toolkit logs the start and end of downtime events together with relevant 
messages. A cluster availability report is generated if downtime events 
occur—that is, if cluster nodes, packages, or network components become 
unavailable. 

This document explains the basic operation of the Foundation Monitor 
toolkit. Topics are as follows: 

• Understanding the Foundation Monitor Toolkit 

• Installing the Foundation Monitor Toolkit 

• Setting Up the Foundation Monitor Toolkit 

• Sample Files 
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Figure 0-1 


Table 0-1 


Installing and Configuring the Foundation Monitor Toolkit 

Understanding the Foundation Monitor Toolkit 


Understanding the Foundation Monitor 
Toolkit 

The Foundation Monitor toolkit monitors the status of mission critical 
environments similar to the one shown in Figure 0-1. 


High Availability Cluster Environment 

nodel node2 



Table 0-1 lists the types of resource monitoring provided by the 
Foundation Monitor toolkit. 

Resources Monitored by the Foundation Monitor Toolkit 


Availability of... 

Foundation Monitor Toolkit Action 

Nodes 

Measures the total time the cluster node is available for 
use. Determines the most likely cause of any downtime 
event and logs the event message. 

Packages 

Logs the start and stop times of all packages. 

Determines the root cause of any downtime event and 
logs the event message. 

Network Components 

Logs the start and end of any network downtime event. 


6 


Chapter 


















































































































































Installing and Configuring the Foundation Monitor Toolkit 

Understanding the Foundation Monitor Toolkit 


A single, daily, summary report on cluster availability is generated once 
per day if one or more downtime events occur on any of the resources 
listed in Table 0-1. A report is not generated if the status of the 
monitored items does not change. For example: 

• If a node goes down, an event, then a report is generated. 

• If the node stays down over the next 24 hours, an additional event is 
not generated and neither is an additional report. 

• When the node returns to up status, an event, then a report is 
generated. 

Refer to the following sections for additional information. 

Cluster Node Availability Tracking 

Use the Foundation Monitor toolkit cluster node availability tracking 
information to determine the cause of system reboots on any node in the 
cluster after any of the following events: 

• shutdown 

• panic 

• power failure 

• halt 

The Foundation Monitor toolkit records cluster node availability as the 
total amount of uptime or downtime for each node in the cluster: 

• Total node uptime is measured from the time the node was booted to 
the last time recorded in the . time_stamp file. 

• Total node downtime is measured from the last time recorded in the 
. time_stamp file to the boot time. 

The Foundation Monitor toolkit uses the following files to record and 
track node availability: 

• A time stamp file—/home/hamon/. time_stamp 

It is updated with the system clock time every 60 seconds. 

• The summary report—/hamon/hamon/reports/cluster_faults 

The following is a sample summary report. It includes node status 
and node name: 
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Understanding the Foundation Monitor Toolkit 


Table 0-2 


Node Name: 

Down Time Start: 
Up Time Start: 
Total Down Time: 
Event Message: 


monk 

May 27, 1999 09:08:10 
May 27, 1999 09:14:32 
0 : 0 : 6:22 
SHUTDOWN 


• A crontab file—/var/spool/cron/crontabs/root 

It specifies when in the day the cluster_faults summary report is 
produced. The cron file must contain the following line: 

55 23 * * * /opt/cmcluster/toolkit/foundation/monitor cron 

• A log file—/home/hamon/data 

It stores the type of reboot event, node uptime, and node downtime 
whenever a reboot occurs. The following is a sample log file entry: 

SYS:12647290:12647672:monk:down:May:27:1999:09:08:10:up:May 
:2 7:1999:09:14:32:0:0:6:22:"SHUTDOWN" 

Where— Label 0 starts on the left, each label field is iterated across 
the record and separated by a colon (:). The /home/hamon/data log 
file has 23 system related label fields as listed in Table 0-2: 


Log File Fields for System Data Log 


Label # 

Field Name 

Comments 

label 0 

SYS 

used to let the script know it was a 
system related event 

label 1 

time down 

calculated in seconds 

label 2 

time up 

calculated in seconds 

label 3 

node name 

name of the node where the event 

occurred 

label 4 

status down 


label 5 

month down 


label 6 

date down 


label 7 

year down 


label 8 

hour down 
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Installing and Configuring the Foundation Monitor Toolkit 

Understanding the Foundation Monitor Toolkit 

Log File Fields for System Data Log (Continued) 


Label # 

Field Name 

Comments 

label 9 

minute down 


label 10 

second down 


label 11 

status up 


label 12 

month up 


label 13 

date up 


label 14 

year up 


label 15 

hour up 


label 16 

minute up 


label 17 

second up 


label 18 

# of day down 


label 19 

# of hour down 


label 20 

# of minute down 


label 21 

# of second down 


label 22 

cause of event 

options are: Panic, Halt, Reboot, 
and Power 
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Understanding the Foundation Monitor Toolkit 

Package Availability Tracking 

Use the Foundation Monitor toolkit package availability tracking to 
determine the cause of package starts or halts issued by any of the 
ServiceGuard commands on any node in the cluster. 

Package starting or halting is issued if one or more of the following 
events occur: 

• node reboot 

• package failure 

• authorized person executes a ServiceGuard command 


Related ServiceGuard commands are listed in Table 0-3: 

ServiceGuard Package Halt and Start Commands 


Command 

Place package in state 

cmruncl 

started 

cmrunnode 

started 

cmrunpkg 

started 

cmhaltcl 

halted 

cmhaltnode 

halted 

cmhaltpkg 

halted 


The Foundation Monitor toolkit uses the following files to record and 
track package status: 

• Package log file 

Determination of the package starting time or halting time is based 
on the package log file associated with the package name. 

• The summary report— /hamon/hamon/reports/cluster_faults 

The following is a sample report. It shows an example of package 
status for the package name ping_pkg: 

Package Name: ping_pkg 

Start Event Time: May 27, 1999 09:50:48 

Event Message: cmrunserv 

Error Message: NA 
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Table 0-4 


• A crontab file—/var/spool/cron/crontabs/root 

It specifies when in the day the cluster_faults summary report is 
produced. The cron file must contain the following line: 

55 23 * * * /opt/cmcluster/toolkit/foundation/monitor cron 

• A log file—/hamon/hamon/pkg_name/data 

It records the package status changed to or from halted or started 
during the previous 24 hours. The following is a sample log file entry: 

PKG:ping_pkg:12736248:up rmonk:May:27:1999:09:50:48:cmrunserv 
PKG:ping_pkg:12749295:down:monk:May:27:1999:13:28:15:cmhaltnode 

Where —Label 0 starts on the left side and each label field is 
separated by a colon (:). The /home/hamon/pkg_name/data log file 
has 12 package related label fields as listed in Table 0-4: 


Log File Fields for Package Data Log 


Label # 

Field Name 

Comments 

label 0 

PKG 

used to let the script know it was a 
package related event 

label 1 

package name 


label 2 

time up or time down 

in seconds 

label 3 

status down or status up 


label 4 

node name 


label 5 

month down or month up 


label 6 

date down or date up 


label 7 

year down or year up 


label 8 

hour down or hour up 


label 9 

minute down or minute up 


label 10 

second down or second up 
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Log File Fields for Package Data Log 


Label # 

Field Name 

Comments 

label 11 

cause of event 

options are: cmhaltcl, 
cmhaltnode, cmhaltpkg, 
cmruncl, cmrunnode, 
cmrunpkg, and cmmodpkg 


Network Availability Tracking 

Use the Foundation Monitor toolkit network availability tracking to 

monitor the availability of the networking connections. 

In High Availability network farms the network connectivity is from the 

server nodes to the client's wiring closet through the customers' intranet 

environment. The layers in the network farm are: 

• The Access layer switch (wiring closet) which handles the client 
connectivity to the access layer switch 

• The Distribution layer (Layer 3) switch for aggregating high-speed 
access from the client's (access layer) 

• The Core layer switch (backbone) is interfaced to the upstream 
distribution layer and to the downstream server distribution. 

• The Server distribution layer switch (data center) which handles the 
server connectivity to the network switch within the Local Area 
Network. 

Figure 0-2 illustrates the network layers. 
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Figure 0-2 
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The Foundation Monitor toolkit records and tracks network connection 
status through the Access Layer switch as follows: 

• The Foundation Monitor toolkit sends ping packets every 30 seconds 
from the server node to each of the Access Layer switches. 

This process verifies the continuous availability of network 
communication on every node in the cluster. 

A successful return of the packet marks the Access Layer switch 
status as up for the availability of network connectivity. 

• If the packet is not returned, then ping packets are sent to each client 
node that is associated with the Access Layer switch. 

— If one client node responds, then the Access Layer switch status is 
marked as up for the availability of network connectivity. 

— If none of the client nodes respond, then the Access Layer switch 
status is marked as down for inaccessible network connectivity. 
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Installing and Configuring the Foundation Monitor Toolkit 

Understanding the Foundation Monitor Toolkit 


The Foundation Monitor toolkit uses the following files to record and 
track network connection status: 

• The summary report—/hamon/hamon/reports/cluster_faults 

The following is a sample report. It shows an example of network 
switch status for the network switch name netswl: 

Network Name: netswl 

IP Address: 192.11.11.100 

Start Down Time: Aug 19, 1999 08:12:24 

Start Up Time: Aug 19, 1999 08:13:19 

• A crontab file—/var/spool/cron/crontabs/root 

It specifies when in the day the cluster_faults summary report is 
produced. The cron file must contain the following line: 

55 23 * * * /opt/cmcluster/toolkit/foundation/monitor cron 

• A log file—/hamon/hamon/network/data 

It records the network connection status of the Access Layer switch 
status whenever a down event occurs. The following is a sample line 
log file entry: 

NET:19987944:down:192.11.11.100:netswl:Aug:19:1999:08:12:24 
:up:Aug:19:1999:08:13:19 

Where— Label 0 starts on the left side and each label field is 
separated by a colon (:). The /home/hamon/network/data log file has 
19 Access Layer switch related label fields as listed in Table 0-5: 


Log File Fields for Network Data Log 


Label # 

Field Name 

Comments 

label 0 

NET 

used to let the script know it is a 
network related event 

label 1 

time down 


label 2 

status down 


label 3 

ip address 

address of the device that is down 

label 4 

switch name 
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Log File Fields for Network Data Log 


Label # 

Field Name 

Comments 

label 5 

month down 


label 6 

date down 


label 7 

year down 


label 8 

hour down 


label 9 

minute down 


label 10 

second down 


label 11 

status up 


label 12 

month up 


label 13 

date up 


label 14 

year up 


label 15 

hour up 


label 16 

minute up 


label 18 

second up 
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Installing the Foundation Monitor Toolkit 


Installing the Foundation Monitor Toolkit 

The Foundation Monitor toolkit is provided as part of the Enterprise 
Cluster Master Toolkit product. Install the Enterprise Cluster Master 
Toolkit on every node within the cluster that requires tracking of the 
cluster node, network, and package availability. 

Prior to installing the Foundation Monitor toolkit, ensure the system 
meets all of the prerequisites listed in Table 0-6: 


Foundation Monitor Toolkit Installation Perquisites 


Cluster Environment Requirement 

HP-UX Prerequisite 

HP9000 High Availability Server 

2 to 16 node 

LAN Interface 

Minimum 2 LAN card per cluster node 

High Availability Network Switch 

Minimum 6 network switch 

High Availability Network Routing 
Switch 

Minimum 2 network routing switch 

HP-UX Operating System 

HP-UX 10.20 or later 

MC/ServiceGuard 

Supported on HP-UX 10.20 or later 

Disk Space 

100 Mbytes for the /home/hamon 
directory 


To install the Enterprise Cluster Master Toolkit on HP-UX: 

1. Log on as root user on the target system. 

2. Mount the CD containing the Enterprise Cluster Master Toolkit. Use 
sam ( 1M) or mount ( 1M) to mount the disk. 

3. Execute swinstall. 

4. From the Software Selection window in swinstall, select the 
Enterprise Cluster Master Toolkit bundle: 

B5139BA, Enterprise Cluster Master Toolkit 

It is recommended that you install the entire bundle. To do this, use 
the Action:Mark For Install to mark the bundle for installation. 
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Setting Up the Foundation Monitor Toolkit 

To set up the Foundation Monitor toolkit: 

1. Ensure your system meets all the prerequisites listed in the following 
section. 

2. Execute the Foundation Monitor toolkit setup script. 

3. Verify the monitor processes are running on each node in the cluster. 

Foundation Monitor Toolkit Prerequisites 

Prior to executing the Foundation Monitor toolkit setup script, 
prerequisites for the following must be satisfied. 

• High Availability Monitors 

• ServiceGuard 

• Network Addressing 

• crontab File 

Refer to the following sections for specific information. 

High Availability Monitors User Prerequisites 

The High Availability Monitors (hamon) user prerequisites are: 

• Create an hamon user login account on every node within the cluster. 

Use proper UID/GID and place the login in the /etc/passwd file. For 
example: 

hamon:password:UID:GID:Hamon User:/home/hamon:/usr/bin/ksh 


NOTE The UID/GID must be consistent across all of the systems. 


• Create a hamon user home directory on every node within the cluster. 
Use mkdir /home/hamon. 

• Mount /home/hamon to a drive with a designated 100MB disk space. 
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• Create a /home/hamon/. rhost file that contains every host name 
within the cluster. 

Each entry in the /home/hamon/ . rhosts file is in a single line (no 
continuation). For example: 

[hostname.domainname [username]] [# comment] 

Refer to the man page on . rhosts for additional information. 

ServiceGuard Prerequisite 

The ServiceGuard cluster must be configured and functioning prior to 
starting the setup of Foundation Monitor toolkit. 

Information about cluster status is stored in the status database which is 
maintained on each individual node within the cluster. Use the cmviewcl 
-v command to display the status information of the cluster. The output 
of the cluster status information on each node must be identical. 

Refer to the man page on cmviewcl and Managing MC/ServiceGuard 
(HP Part Number B3936-90026) for information on examples of cluster 
and package states. 


Network Prerequisites 

The following network criteria must be met prior to configuring the 
network information during the Foundation Monitor toolkit setup. 

• The IP addresses and hostnames of the clients, cluster nodes, 
network switches, and network routing switches illustrated in Figure 
0-2 must be resolved by the Domain Name Server. 

• The client’s IP address must all use the same subnetwork as the 
network Access layer switch. 

An IP address consists of a class number, a network number, a 
subnetwork number, and a host number. Table 0-7 lists the network 
class A IP network address of 15.144.133.122 and describes each 
number: 


IP Network Address Prerequisite 


Class Number 

Network 

Number 

Subnetwork 

Number 

Host Number 

15 

144 

133 

122 
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Cron Prerequisite 

Log on as root and add the following line to the root crontab file, 
/var/spool/cron/crontabs/root, on each node in the cluster. 

55 23 * * * /opt/cmcluster/toolkit/foundation/monitor cron 

To add a cron entry, perform the following: 

# crontab -1 > /tmp/rootcron 

# echo "55 23 * * * opt/cmcluster/toolkit/foundation/monitor 
cron" >> /tmp/rootcron 

# crontab /tmp/rootcron 


Running the Foundation Monitor Toolkit Setup Script 

Setup the Foundation Monitor toolkit to enable cluster availability 
tracking. 

To setup the Foundation Monitor toolkit: 

1. Log on as root on each node within the cluster. 

2. Execute the following script: 

#/opt/cmcluster/toolkit/foundation/monitor setup 
The setup script then prompts for information. 

3. The Foundation Monitor Toolkit measurement of availability 
is based on one year from the cluster production date. 

Is the cluster ready to go production: [yes] or [no]? n 

If the answer is Y, the script creates a /home/hamon/time_stamp file 
with a date of one year from today. 

If the answer is N, the script does not create a /home/hamon/ 
time_stamp file. This prompt appears every time the script is 
executed until a timestamp file is created. 


NOTE To reset the time_stamp file date, re-run the setup script. 


4. Please enter Access Layer Switch 1 IP Address: 15.8.232.1 
You entered 15.8.232.1. Is this correct: [yes] or [no]: y 

5. Please enter client 1 of Access Switch 1: 15.14.224.6 

You entered 15.14.224.6. Is this correct: [yes] or [no]? y 


Chapter 


19 





Installing and Configuring the Foundation Monitor Toolkit 
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6. Enter another client: [yes] or [no]: n 

If the answer is Y, the script prompts for another client IP address. 
This repeats until N is entered. 

7. Enter another Access Switch: [yes] or [no]: n 

If the answer is Y, the script prompts for an additional Access Layer 
Switch IP Address and repeats the steps 4, 5, and 6. 

If the answer is N, the script exits and setup is complete. 

Refer to the section “Sample Files” for additional information. 

Verifying the Foundation Monitor Toolkit Processes 

After the Foundation Monitor toolkit setup script has completed, run the 
following command to verify the monitor processes are running on every 
node within the cluster: 

#ps -ef | grep monitor | grep -v grep 

/opt/cmcluster/toolkit/foundation/monitor ./monitor time 
/opt/cmcluster/toolkit/foundation/monitor ./monitor netmon 
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Sample Files 

This sections provides the following samples: 

• Foundation Monitor Setup Output 

• Data Log File 

Foundation Monitor Setup Output 

#./monitor setup 

Checking root capabilities.. . . 

Checking monitor parameters .... 

The Foundation Monitor Toolkit measurement of availability is 
based on one year from the cluster production date. Is the 
cluster ready to go production: [yes] or [no]? n 

Checking cluster status.... 

Checking remote access.... 

Remote access to "thxll38" as "hamon" id(7692) suceeded. 
Checking /etc/inittab parameters.. . . 

Checking time stamp parameters.... 

Time stamp function already running, PID = 17811! 

Setting up cron.... 

Network setup section.... 

Please enter Access Layer Switch 1 IP Address: 15.8.232.1 
You entered 15.8.232.1. Is this correct: [yes] or [no]? y 

Please enter client 1 of Access Switch 1: 15.14.224.6 

You entered 15.14.224.6. Is this correct: [yes] or [no]? y 

Enter another client: [yes] or [no]? n 
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Enter another Access Switch: [yes] or [no]? n 
Starting Network Monitoring function.... 
Monitor setup Complete. 


Data Log File 

PKG:vpn_pkg:20395302:down:monk:Aug:24:1999:01:21:42:cmhaltserv 
PKG:ste_pkg:20395328:up:shorter:Aug:24:1999:01:22:08:cmrunserv 
SYS:20681165:20682006:shorter:down:Aug:28:1999:08:46:05: 
up : Aug: 28 :1999:09:00:06:0:0:14:1:"POWER" 


22 


Chapter 



